ABSTRACT

In cyber-physical systems such as automobiles, measurement
data from sensor nodes should be delivered to other consumer
nodes such as actuators in a regular fashion. But, in
practical systems over unreliable media such as wireless, it
is a significant challenge to guarantee small enough
inter-delivery times for different clients with heterogeneous
channel conditions and inter-delivery requirements. In this
paper, we design scheduling policies aiming at satisfying the
inter-delivery requirements of such clients. We formulate the
problem as a risk-sensitive Markov Decision Process (MDP).
Although the resulting problem involves an infinite state
space, we first prove that there is an equivalent MDP
involving only a finite number of states. Then we prove the
existence of a stationary optimal policy and establish an
algorithm to compute it in a finite number of steps.

However, the bane of this and many similar problems is
the resulting complexity, and, in an attempt to make
fundamental progress, we further propose a new high
reliability asymptotic approach. In essence, this approach
considers the scenario when the channel failure probabilities for
different clients are of the same order, and asymptotically
approach zero. We thus proceed to determine the
asymptotically optimal policy: in a two-client scenario, we show
that the asymptotically optimal policy is a “modified least
time-to-go” policy, which is intuitively appealing and easily
implementable; in the general multi-client scenario, we are
led to an SN policy, and we develop an algorithm of low
computational complexity to obtain it. Simulation results
show that the resulting policies perform well even in the
pre-asymptotic regime with moderate failure probabilities.


Categories and Subject Descriptors 

C.2.1 [Network Architecture and Design]: Wireless
Communication


Permission to make digital or hard copies of all or part of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage and that copies bear this notice and the full
citation on the first page. Copyrights for components of this work owned by others than
ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or
republish, to post on servers or to redistribute to lists, requires prior specific permission
and/or a fee. Request permissions from permissions@acm.org.

MobiHoc'15, June 22-25, 2015, Hangzhou, China.

Copyright © 2015 ACM 978-1-4503-3489-1/15/06 ...$15.00. 
http://dx.doi.org/10.1145/2746285.2746305 


[Figure 1 labels: Actuators; Sensor 1 (Temperature); Sensor 2 (Pressure); Sensor N (Location)]

Figure 1: An in-vehicular network with an access
point and several wirelessly connected sensors and
actuators.
1. INTRODUCTION

Delay and throughput have long been regarded as important
quality of service (QoS) metrics [9][lT 25 . However,
with the increasing deployment of real-time applications
such as sensor networks and surveillance applications
over unreliable media such as wireless, guaranteeing small
enough inter-delivery times between packets becomes
important. As an example, consider an in-vehicle
wireless sensor network illustrated in Fig. 1. In-vehicle
wireless sensor networks have been drawing increasing attention
recently since they can significantly reduce costs, reduce the
weight of the wiring harness and hence increase fuel
efficiency, and are extensible and scalable, as compared to
wired in-vehicle networks [6, 24]. Such a cyber-physical
system features several (about a hundred) wireless sensor nodes
monitoring processes such as temperature and pressure, and
continually transmitting their measurements to controllers
which then choose appropriate actuation signals. In these
systems, one is allowed to control the arrival process since
outdated packets containing old sensor measurements can be
replaced by newer packets. Since a large gap between
updates can lead to system instability, the inter-delivery time of
these packets is an important QoS metric. Different clients
may have different channel conditions and inter-delivery
requirements, which further complicates the problem.

In this paper, our goal is to design scheduling policies that 
decide which client’s packet to transmit in each time slot in 
such systems, so as to guarantee small enough inter-delivery 
times between packets. 

To penalize severely the deviations in inter-delivery times
that are larger than a certain threshold, we consider the
exponential cost function,

$$\exp\left(\theta \left(D_i^{(n)} - \tau_n\right)^+\right),$$

where $\theta > 0$ is a risk-aversion parameter. Here, $D_i^{(n)}$ is the
inter-delivery time of client $n$, $\tau_n$ is a specified inter-delivery
threshold for client $n$, and $(a)^+ := \max\{0, a\}$.

We formulate the optimization problem as a risk-sensitive
Markov Decision Process (MDP) [4, 5, 7, 11, 12, 16]. Though it
is over an infinite state space, we show that there is an
equivalent MDP that involves only a finite state space. For this
equivalent MDP, we then prove the existence of a stationary
optimal policy, and obtain an algorithm which determines
the optimal policy in a finite number of steps.

The significant challenge of this and other modeling efforts
is however the complexity of determining an optimal solution
to the MDP, which is excessively large even for systems with
a moderate number of states (e.g., a hundred nodes), as is of
interest in many applications. To address this critical
challenge, we further propose a new approach to channel
modeling which we call the “high reliability asymptotic regime”.
In essence, this approach considers the scenario when the
channel failure probabilities for different clients are of the
same order, and asymptotically approach zero. We then
proceed to determine the asymptotically optimal policy. Such
a policy is expected to provide near optimal performance
even when the channel failure probabilities are non-zero and
range from small to moderate values. Philosophically, this
approach can be regarded as similar to studying the high
SNR asymptotics in network information theory [5].

In the case where there are two clients, the asymptotically
optimal policy has a very appealing structure, lending support
to this approach. It is a structurally clean
“modified-least-time-to-go” policy, which
is both intuitively appealing and easily implementable.

Our interest, motivated by cyber-physical systems
applications, is however in large systems with several sensors and
actuators. For this general multi-client scenario, the
asymptotic approach leads to an SN policy, and an algorithm of
relatively low computational complexity to obtain it. The
success of such an asymptotic approach however depends on
its performance in the pre-asymptotic regime. We present
simulation results showing that the resulting policies do
perform well in this regime.

The rest of the paper is organized as follows. We review
the related works in Section 2. We present the system model
in Section 3. We formulate the problem as a risk-sensitive
MDP in Section 4, and reduce it to a finite state problem in
Section 5. We prove the existence of a stationary optimal
policy, and apply the classic risk-sensitive MDP approach in
Section 6. We propose the high reliability asymptotic
approach in Section 7. In Section 8, we design asymptotically
optimal policies by analyzing the high reliability asymptote.
Simulation results are presented in Section 9, followed by
conclusions in Section 10.


2. RELATED WORK 

Li, Eryilmaz and Li [15] and Li, Li and Eryilmaz [14] are
apparently the first to consider inter-delivery time as a
performance metric. These works analyzed this metric in the
context of queueing systems, where the relevant trade-off is
between stabilizing the queues and minimizing the sum of
the inter-delivery times over all the clients. However, the
situation is very different in wireless sensor networks where
packets contain sensor measurements and one can simply
replace older packets by newer packets, thus resulting
in no queues. Sadi and Ergen [5T have pointed out the 
periodic nature of sensor nodes in intra-vehicular wireless
sensor networks. In another relevant work, Singh, Guo, and
Kumar [22] have addressed the issue of trading off higher
throughput for better performance with respect to variations
in the inter-delivery times. However, this work does not
allow tunable and heterogeneous inter-delivery requirements,
and a buffer is maintained so as to mitigate the influence
of variations in inter-delivery times. Guo, Singh, Kumar,
and Niu [8] have further combined the inter-delivery time
requirement with the system energy-efficiency. Singh and
Stolyar [23] have shown that the service process under the
Max Weight scheduling is asymptotically smooth.

The study of risk-sensitive MDPs dates back to Howard
and Matheson [11]. There has been considerable work on
proving the existence of a stationary optimal policy under
different conditions [4, 7, 16]. However, Chung and Sobel [5] have
pointed out that, in contrast to the risk-neutral MDPs, even
with a discounted cost, a stationary optimal policy need not
exist for a general risk-sensitive MDP. This introduces an
additional challenge to our study. In wireless communications,
Altman et al. [1] have applied risk-sensitive MDP techniques
to design power control strategies aiming at minimizing the
delivery failure probability in delay tolerant networks.

Avestimehr et al. [2] have proposed a deterministic
channel model to study the high SNR asymptotics in the field of
network information theory, thus obtaining constant gap
approximations to the capacity of wireless networks. This has
led to near-optimal and easy-to-implement communication
schemes for Gaussian relay networks. Kittipiyakul et al. [13]
and Zhang et al. [26] have also employed such an asymptotic
approach to analyze error performance in fading channels.


3. SYSTEM MODEL 

Consider a system with $N$ wireless sensors and one access
point (AP). Time is discretized into slots. The AP
broadcasts a control message at the beginning of each time-slot
to announce which sensor can transmit in the slot. The
assigned sensor then transmits a packet. The size of a time
slot is the time required for the AP to send the control
message plus the time for a client to prepare and transmit a
packet. It is assumed that the wireless channel connecting
sensor $n$ and the AP has a channel reliability of $p_n \in (0,1)$,
which can be taken to be the probability that the control
message from the AP and the transmission from client $n$ are
both successful. The system model can be generalized to
take into account more general fading models.

The QoS requirement for client $n$ is modeled through a
specified value for the inter-delivery threshold $\tau_n$. The cost
incurred by the system in $T$ time-steps is modeled as,

$$E\left[\exp\left(\theta \sum_{n=1}^{N}\left(\sum_{i=1}^{M_T^{(n)}}\left(D_i^{(n)}-\tau_n\right)^+ + \left(T - t_{M_T^{(n)}}^{(n)} - \tau_n\right)^+\right)\right)\right], \qquad (1)$$

where $D_i^{(n)}$ is the time between the $(i-1)$-th and $i$-th packet
deliveries of client $n$, $M_T^{(n)}$ is the number of packets
delivered for the $n$-th client by time $T$, $t_i^{(n)}$ is the time slot in
which the $i$-th packet for client $n$ is delivered, and
$(a)^+ := \max\{a, 0\}$. The last term is included since, otherwise, the
policy of never making any transmission at all will result in
the least cost. The parameter $\theta > 0$ makes the problem
risk-averse. The goal of the AP is to decide the client to
transmit in each time slot, in order to minimize the above cost.
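
As a concrete illustration of the cost in (1), the following minimal sketch evaluates it for one given realization of delivery times (i.e., the quantity inside the expectation). The function name, the argument layout, and the convention that inter-delivery times are counted from time 0 for the first packet are assumptions made here for illustration only.

```python
import math


def finite_horizon_cost(deliveries, tau, theta, horizon):
    """Evaluate exp(theta * total excess) for one sample path.

    deliveries[n] is the increasing list of slots in which client n's
    packets were delivered; tau[n] is its inter-delivery threshold.
    Assumption: the first inter-delivery time is measured from time 0.
    """
    exponent = 0.0
    for times, tau_n in zip(deliveries, tau):
        previous = 0
        for t in times:
            # (D_i - tau_n)^+ for the i-th inter-delivery time
            exponent += max(t - previous - tau_n, 0)
            previous = t
        # Last term of (1): time since the final delivery, up to the horizon
        exponent += max(horizon - previous - tau_n, 0)
    return math.exp(theta * exponent)
```

For instance, a single client with threshold 2, deliveries in slots 2 and 5, and horizon 10 accumulates an excess of 0 + 1 + 3 = 4, so the cost is exp(4θ). A client that is never served accumulates the full tail term, which is why that term is needed.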

4. PROBLEM FORMULATION

We first describe the notations used: Vectors will be in
bold font, e.g., $\boldsymbol{\tau} := (\tau_1, \dots, \tau_N)$ and $\mathbf{x} := (x_1, \dots, x_N)$.
Define $a_n \wedge b_n := \min\{a_n, b_n\}$, and
$\mathbf{a} \wedge \mathbf{b} := (a_1 \wedge b_1, \dots, a_N \wedge b_N)$.
The system state at time $t$ is denoted as

$$\mathbf{X}(t) := (X_1(t), \dots, X_N(t)),$$

where $X_n(t)$ is the time elapsed since the most recent packet
delivery for client $n$. Thus, the state space is $\{0, 1, \dots\}^N$,
which is finite for the finite time horizon problem, but
grows exponentially to infinity as the horizon increases. Let
control $U(t)$ denote the client transmitting in time slot $t$.
The system state evolves as,

$$X_n(t+1) = \begin{cases} 0 & \text{if a packet is delivered for client } n \text{ in slot } t, \\ X_n(t) + 1 & \text{otherwise.} \end{cases}$$

As a consequence, the system forms a controlled Markov
chain, with transition probabilities,

$$P[\mathbf{X}(t+1) = \mathbf{y} \mid \mathbf{X}(t) = \mathbf{x}, U(t) = u] = \begin{cases} p_u & \text{if } \mathbf{y} = (x_1 + 1, \dots, x_{u-1} + 1, 0, x_{u+1} + 1, \dots, x_N + 1), \\ 1 - p_u & \text{if } \mathbf{y} = \mathbf{x} + \mathbf{1}, \\ 0 & \text{otherwise,} \end{cases}$$

where $\mathbf{1} := (1, \dots, 1)$. The $T$-horizon optimal cost-to-go
from initial state $\mathbf{x}$ is given by,

$$V_T(\mathbf{x}) := \min_{\pi} E_{\pi}\left[\exp\left(\theta \sum_{t=0}^{T-1} \sum_{n=1}^{N} \left(X_n(t) + 1 - \tau_n\right)^+ \mathbb{1}\{X_n(t+1) = 0\}\right) \,\Big|\, \mathbf{X}(0) = \mathbf{x}\right], \qquad (2)$$

where $\mathbb{1}\{\cdot\}$ is the indicator function, and assuming $\mathbf{X}(T) := \mathbf{0}$
so as to recover the last term in the cost (1). Here, the
minimization is taken over all history-dependent scheduling
policies $\pi$. Our goal is to design an optimal history-dependent
policy that achieves the optimal cost-to-go $V_T(\mathbf{x})$ for any
initial state $\mathbf{x}$.

Later in Section 6, equation (11), we will consider the
infinite horizon cost $J(\pi, \mathbf{x})$ when $T \to \infty$.

5. REDUCTION TO FINITE STATE PROBLEM

We denote the problem in Section 4 as MDP-1. We now
show that it is equivalent to another finite-state problem.

It directly follows from (2) that the DP recursive
relationship of the optimal cost-to-go functions in MDP-1 is:

$$V_T(\mathbf{x}) = \min_{n} \left\{ p_n \exp\left(\theta (x_n + 1 - \tau_n)^+\right) V_{T-1}(S_n(\mathbf{x})) + (1 - p_n) V_{T-1}(\mathbf{x} + \mathbf{1}) \right\}, \qquad (3)$$

where

$$S_n(\mathbf{x}) := (x_1 + 1, \dots, x_{n-1} + 1, 0, x_{n+1} + 1, \dots, x_N + 1), \qquad (4)$$

i.e., the state that succeeds the state $\mathbf{x}$ in the event of a
successful transmission for client $n$.

Lemma 1. For MDP-1, the following results hold:

1) For all $n \in \{1, \dots, N\}$, and $\forall x_1, \dots, x_N \geq 0$,

$$V_T(x_1, \dots, x_n + \tau_n, \dots, x_N) = \exp(\theta x_n) \cdot V_T(x_1, \dots, \tau_n, \dots, x_N). \qquad (5)$$

Further, the optimal controls in the two states,
$(x_1, \dots, x_n + \tau_n, \dots, x_N)$ and $(x_1, \dots, \tau_n, \dots, x_N)$, are
the same.

2) The optimal cost function starting with any system state
$\mathbf{x}$ such that $x_n \leq \tau_n, \forall n$ satisfies:

$$V_T(\mathbf{x}) = \exp\left(\theta \sum_{n=1}^{N} \mathbb{1}\{x_n = \tau_n\}\right) \min_{u} \left\{ p_u V_{T-1}(S_u(\mathbf{x}) \wedge \boldsymbol{\tau}) + (1 - p_u) V_{T-1}((\mathbf{x} + \mathbf{1}) \wedge \boldsymbol{\tau}) \right\}, \qquad (6)$$

where $S_u(\mathbf{x})$ is as in (4).

3) $\mathbf{Y}(t) := \mathbf{X}(t) \wedge \boldsymbol{\tau}$ is a Markov Decision Process, i.e.,

$$P[\mathbf{Y}(t+1) \mid \mathbf{Y}(t), \dots, \mathbf{Y}(0), U(t), \dots, U(0)] = P[\mathbf{Y}(t+1) \mid \mathbf{Y}(t), U(t)].$$

Proof. The proof is omitted due to space constraints. □

Now, we construct a new MDP-2, and show in Theorem 1
that it is equivalent to MDP-1 in an appropriate sense. By
slightly abusing notation, we still use the symbols $\mathbf{Y}(t)$ and
$U(t)$ for state and client-to-transmit.

Let us associate a state variable $Y_n(t)$ with each client $n$,
with $Y_n(0) \in \{0, 1, \dots, \tau_n\}$, which evolves as,

$$Y_n(t+1) = \begin{cases} 0 & \text{if a packet is delivered for client } n \text{ in slot } t, \\ (Y_n(t) + 1) \wedge \tau_n & \text{otherwise.} \end{cases}$$

Then the system state space is $\mathbb{Y} := \prod_{n=1}^{N} \{0, 1, \dots, \tau_n\}$,
which is finite, even for the infinite time horizon problem.
The transition probabilities of the process
$\mathbf{Y}(t) := (Y_1(t), \dots, Y_N(t))$ depend on the control $U(t)$ as,

$$P[\mathbf{Y}(t+1) = \mathbf{y} \mid \mathbf{Y}(t) = \mathbf{x}, U(t) = u] = \begin{cases} p_u & \text{if } \mathbf{y} = S_u(\mathbf{x}) \wedge \boldsymbol{\tau}, \\ 1 - p_u & \text{if } \mathbf{y} = (\mathbf{x} + \mathbf{1}) \wedge \boldsymbol{\tau}, \\ 0 & \text{otherwise,} \end{cases} \qquad (7)$$

where $S_u(\mathbf{x})$ is as in (4).

We associate the following cost to the system with starting
state $\mathbf{x} \in \mathbb{Y}$, when policy $\pi$ is applied:

$$\tilde{V}_T^{\pi}(\mathbf{x}) = E_{\pi}\left[\exp\left(\theta \sum_{t=0}^{T-1} \sum_{n=1}^{N} \mathbb{1}\{Y_n(t) = \tau_n\}\right) \,\Big|\, \mathbf{Y}(0) = \mathbf{x}\right]. \qquad (8)$$

The optimal cost-to-go function is,

$$\tilde{V}_T(\mathbf{x}) := \min_{\pi} \tilde{V}_T^{\pi}(\mathbf{x}), \quad \forall \mathbf{x} \in \mathbb{Y}. \qquad (9)$$

Here, the superscript tilde is to distinguish it from the
optimal cost for the MDP-1 in (2).

Theorem 1. The MDP-2 is equivalent to MDP-1 in the
following senses:

1) The optimal cost-to-go functions of the two MDPs are
equal in each time slot $t$ for any starting state $\mathbf{x}$ such
that $x_n \leq \tau_n, \forall n$, i.e.,

$$V_T(\mathbf{x}) = \tilde{V}_T(\mathbf{x}), \quad \forall \mathbf{x} \in \mathbb{Y};$$

2) Any optimal control for MDP-1 in state $\mathbf{x}$ is also optimal
for MDP-2 in state $\mathbf{x} \wedge \boldsymbol{\tau}$, and conversely.

Proof. The DP recursion for the optimal cost in MDP-2 is

$$\tilde{V}_T(\mathbf{x}) = \exp\left(\theta \sum_{n=1}^{N} \mathbb{1}\{x_n = \tau_n\}\right) \min_{u} \left\{ \sum_{\mathbf{y}} P_u(\mathbf{x}, \mathbf{y}) \tilde{V}_{T-1}(\mathbf{y}) \right\}, \qquad (10)$$

where $P_u(\mathbf{x}, \mathbf{y}) := P[\mathbf{Y}(t+1) = \mathbf{y} \mid \mathbf{Y}(t) = \mathbf{x}, U(t) = u]$.
Recalling (7), we note that the r.h.s. of (6) and the r.h.s. of
(10) evolve in exactly the same way. Thus, the optimal cost
for MDP-1, i.e., $V_T(\mathbf{x})$, and the optimal cost for MDP-2,
i.e., $\tilde{V}_T(\mathbf{x})$, have identical recursive relationships for $\mathbf{x} \in \mathbb{Y}$.
Consequently, statement 1) follows.

In addition, due to the identical recursive relationships,
the optimal controls at any state $\mathbf{x} \in \mathbb{Y}$ for the two systems
are also identical. Combining this with the first statement
of Lemma 1, we obtain statement 2). □
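
The equivalence of the two recursions can be checked numerically on a small instance. The sketch below implements the MDP-1 recursion (3) and the MDP-2 recursion (10) with memoization. The parameter values, the 0-based client indices, and the terminal values are assumptions made for illustration: MDP-2 starts from a terminal value of 1, and MDP-1 starts from exp(θ Σ_n (x_n − τ_n)^+), chosen so that one step of (3) reproduces the convention X(T) := 0 used in (2).

```python
import math
from functools import lru_cache
from itertools import product

# Illustrative parameters (assumptions, not taken from the paper)
THETA = 0.5
TAU = (2, 3)      # inter-delivery thresholds tau_n
P = (0.9, 0.8)    # channel reliabilities p_n
N = len(TAU)


def succ(x, n):
    """S_n(x) from (4): client n is delivered, all other counters grow by one."""
    return tuple(0 if m == n else x[m] + 1 for m in range(N))


def clip(x):
    """Componentwise minimum with the threshold vector tau."""
    return tuple(min(x[m], TAU[m]) for m in range(N))


@lru_cache(maxsize=None)
def v_mdp1(T, x):
    """Optimal cost-to-go of MDP-1 via recursion (3) on the unbounded state."""
    if T == 0:
        # Assumed terminal value, consistent with X(T) := 0 in (2)
        return math.exp(THETA * sum(max(x[m] - TAU[m], 0) for m in range(N)))
    fail = tuple(v + 1 for v in x)
    return min(
        P[n] * math.exp(THETA * max(x[n] + 1 - TAU[n], 0)) * v_mdp1(T - 1, succ(x, n))
        + (1 - P[n]) * v_mdp1(T - 1, fail)
        for n in range(N)
    )


@lru_cache(maxsize=None)
def v_mdp2(T, x):
    """Optimal cost-to-go of MDP-2 via recursion (10) on the truncated state."""
    if T == 0:
        return 1.0
    stage = math.exp(THETA * sum(x[m] == TAU[m] for m in range(N)))
    fail = clip(tuple(v + 1 for v in x))
    return stage * min(
        P[u] * v_mdp2(T - 1, clip(succ(x, u))) + (1 - P[u]) * v_mdp2(T - 1, fail)
        for u in range(N)
    )


# All states of the finite state space of MDP-2
STATES = list(product(*(range(t + 1) for t in TAU)))
```

Comparing `v_mdp1(T, x)` with `v_mdp2(T, x)` over `STATES` for a few horizons illustrates statement 1) of Theorem 1; the memo table of `v_mdp1` keeps growing with the horizon, while that of `v_mdp2` is bounded by the size of the truncated state space.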

6. THE RISK-SENSITIVE APPROACH

The great advantage of the equivalent MDP-2 is that its
state space is finite. Following Theorem 1, we focus
exclusively on MDP-2 in the following.

To consider long-term operation, we define the (risk-sensitive
infinite horizon) average cost under policy $\pi$ starting at state
$\mathbf{x}$,

$$J(\pi, \mathbf{x}) := \limsup_{T \to \infty} \frac{1}{\theta} \frac{1}{T} \ln \tilde{V}_T^{\pi}(\mathbf{x}), \quad \forall \mathbf{x} \in \mathbb{Y}, \qquad (11)$$

where $\tilde{V}_T^{\pi}(\mathbf{x})$ is as in (8).

A stationary policy is one that decides the current control
(which client to transmit) by the current system state. Thus,
it can be described by a map $f$ from state space $\mathbb{Y}$ to control
set $\{1, \dots, N\}$, i.e., the control $U(t) = f(\mathbf{Y}(t))$.

We now define the class of Non-Exclusionary (NE) policies
as those stationary policies which do not serve a client $n$
when the system state $\mathbf{x}$ is $(\tau_1, \dots, \tau_{n-1}, 0, \tau_{n+1}, \dots, \tau_N)$. It
is shown in Appendix A that for any non-NE policy, either
there is an NE policy out-performing it, or it is trivial to
obtain the cost associated with it. Thus, we focus on NE
policies.

In the following, we use the standard notations of
transient/non-transient states and communicating classes [18].
For a stationary policy $f$, let $P^f$ denote its transition
probability matrix,

$$(P^f)_{\mathbf{x},\mathbf{y}} := P[\mathbf{Y}(t+1) = \mathbf{y} \mid \mathbf{Y}(t) = \mathbf{x}, U(t) = f(\mathbf{x})].$$

Further, let $L^f$ denote the dis-utility matrix of $f$, such that,

$$(L^f)_{\mathbf{x},\mathbf{y}} := \exp\left(\theta \sum_{n=1}^{N} \mathbb{1}\{x_n = \tau_n\}\right) \cdot (P^f)_{\mathbf{x},\mathbf{y}}, \quad \forall \mathbf{x}, \mathbf{y} \in \mathbb{Y}.$$

Let $\rho(L^f)$ be its spectral radius.

Lemma 2. Consider any NE policy $f$,

1) There is exactly one non-transient communicating class,
which includes the state $(\tau_1, \dots, \tau_N)$.

2) For any transient state $\mathbf{y}$, we have, $(P^f)_{\mathbf{y},\mathbf{y}} = 0$.

3) Assuming that there is only one communicating class
(and thus non-transient) when $f$ is applied, we have,

$$J(f, \mathbf{y}) = \frac{1}{\theta} \ln \rho(L^f), \quad \forall \mathbf{y}. \qquad (12)$$

Proof. Since $p_n < 1$ for each client $n$, there is a positive
probability that there will be no packet deliveries for
$\tau_{\max} := \max_{n=1}^{N} \tau_n$ time slots, and so system state
$(\tau_1, \dots, \tau_N)$ is reachable from any state. This proves 1).

We now prove 2). From (7), we have, $(P^f)_{\mathbf{y},\mathbf{y}} \neq 0$ implies
either $\mathbf{y} = (\tau_1, \dots, \tau_N)$, or $\exists n$ and $\mathbf{y}$ such that $f(\mathbf{y}) = n$ and
$y_n = 0$, $y_i = \tau_i, \forall i \neq n$. However, the state $(\tau_1, \dots, \tau_N)$ in
the former case is non-transient, while the latter condition
is ruled out because of the property of the NE policy.

Statement 3) is proved by matrix analysis techniques, and
the proof is omitted here due to space constraints. □

In the following, we assume that for any NE policy, there
is only one self-communicating class. This assumption is
not restrictive considering the first statement in Lemma 2,
since we at least can simply restrict the state space to the
one non-transient communicating class. (In addition, by the
second statement of Lemma 2, it follows that there is no
one-state transient communicating class.) As a result, the
average cost of any NE policy can be obtained by (12).

Next, let $p_{\max} := \max_{n=1}^{N} p_n$ and $\tau_{\max} := \max_{n=1}^{N} \tau_n$. We
further denote $K := \tau_{\max} (1 - p_{\max})^{-\tau_{\max}}$.

Theorem 2. Let

$$\theta_{th} := \frac{\ln(K + 1) - \ln(K)}{2N(K + 1)}.$$

For the infinite-horizon MDP-2,

1) There exists a stationary optimal policy when $\theta < \theta_{th}$;

2) Further, this stationary optimal policy can be computed
in a finite number of steps.

Proof. Denote by $T_{\boldsymbol{\tau}}(\mathbf{x})$ the first passage time from state
$\mathbf{x}$ to $(\tau_1, \dots, \tau_N)$, i.e.,

$$T_{\boldsymbol{\tau}}(\mathbf{x}) := \min\{t > 0 \mid \mathbf{Y}(t) = (\tau_1, \dots, \tau_N), \mathbf{Y}(0) = \mathbf{x}\}.$$

We first prove the simultaneous Doeblin condition |T], i.e.,
that for any stationary policy $f$,

$$E_f[T_{\boldsymbol{\tau}}(\mathbf{x})] \leq K, \quad \forall \mathbf{x}. \qquad (13)$$

Note that for any initial state $\mathbf{x}$, the state $\boldsymbol{\tau}$ will be hit if
there are $\tau_{\max}$ successive transmission failures. The
probability of this event is $\geq (1 - p_{\max})^{\tau_{\max}}$. Consider the
probability that the state $\boldsymbol{\tau}$ is hit within $j\tau_{\max}$ time slots, then,

$$E_f[T_{\boldsymbol{\tau}}(\mathbf{x})] \leq \sum_{j=1}^{\infty} j \tau_{\max} (1 - p_{\max})^{\tau_{\max}} \left[1 - (1 - p_{\max})^{\tau_{\max}}\right]^{j-1} = \frac{\tau_{\max}}{(1 - p_{\max})^{\tau_{\max}}} = K.$$

This proves (13). Consequently, statement 1) is proved by
combining (13) with the Theorem 3.1 in [4].

In addition, since we can obtain the average cost of any
stationary policy (the cost of an NE policy by Lemma 2 and the
cost of a non-NE policy by Appendix A), and since there are
a finite number of possible stationary policies (resulting from
finite state space and control set), statement 2) follows. □
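
The constants in Theorem 2 are simple closed-form expressions of the channel reliabilities and thresholds. The sketch below evaluates the bound K on the expected first passage time and the threshold θ_th; the function names and the example values are hypothetical and only meant to show how restrictive the sufficient condition can be.

```python
import math


def doeblin_bound(p, tau):
    """K = tau_max / (1 - p_max)^tau_max, the bound appearing in (13)."""
    p_max, tau_max = max(p), max(tau)
    return tau_max / (1.0 - p_max) ** tau_max


def theta_threshold(p, tau):
    """theta_th = (ln(K + 1) - ln(K)) / (2 N (K + 1)) as defined in Theorem 2."""
    k = doeblin_bound(p, tau)
    n_clients = len(p)
    # log1p(1 / k) equals ln(k + 1) - ln(k) and is numerically stable for large k
    return math.log1p(1.0 / k) / (2.0 * n_clients * (k + 1.0))
```

With two clients, p = (0.9, 0.8) and τ = (2, 3), K is about 3000 and θ_th is of the order of 1e-8, so the sufficient condition θ < θ_th in statement 1) is tight only for very small risk-aversion parameters in this example.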






7. THE HIGH RELIABILITY ASYMPTOTIC 
APPROACH 

The risk-sensitive approach faces significant challenges on
the issue of computational complexity. For MDP-2, denote
the cardinality of the state space $\mathbb{Y}$ by $|\mathbb{Y}|$. Since a
stationary policy maps each state to one of the $N$ clients, there
are in total $N^{|\mathbb{Y}|}$ stationary policies, each of which requires
calculating the spectral radius of a $|\mathbb{Y}| \times |\mathbb{Y}|$ matrix to obtain
the corresponding average cost (see Lemma 2). Further, the
classic policy iteration technique [18] does not apply to this
risk-sensitive problem, which has a non-irreducible
structure, and has a communicating class changing for different
policies, and thus one needs to compare the cost over all
the policies in order to find the optimal policy. In addition,
since $|\mathbb{Y}| = \prod_{n=1}^{N} (\tau_n + 1)$, the cardinality of the state-space
is exponential in the number of clients, $N$.
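
To get a feeling for these numbers, the following small sketch counts the states of MDP-2 and the stationary policies, using that a stationary policy is a map from the state space to the set of N clients. The function names are hypothetical.

```python
import math


def state_space_size(tau):
    """|Y| = product over clients of (tau_n + 1)."""
    return math.prod(t + 1 for t in tau)


def num_stationary_policies(tau):
    """Number of maps from the state space to the N clients: N ** |Y|."""
    return len(tau) ** state_space_size(tau)
```

Already for three clients with thresholds (2, 3, 4) there are 60 states and 3^60 (more than 10^28) stationary policies, so exhaustive comparison is out of reach well before the system sizes of practical interest.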

However, when the channel reliabilities are close to 1, i.e.,
the system of interest is in a high-reliability asymptotic
regime, we are able to show that a simple
“modified-least-time-to-go” (MLG) policy, which is both structurally
clean and easily implementable, is optimal. We note that
the high-channel-reliability asymptotic is similar to the high
SNR asymptotic in network information theory; see [2].

In this section, we derive some results useful for analyzing 
the MLG policy in the high-reliability regime. Later, in 
Section 8, we prove the optimality of the MLG policy in the
case of two clients and propose a low-complexity policy for
the multi-client case which turns out to have good performance in
the high-reliability regime.

For ease of exposition, we begin with the simple case of 
two clients sharing an AP. 

7.1 Two-client Scenario and the MLG Policy 

Consider two clients with channel reliabilities $p_1 = 1 - b_1 \epsilon$,
$p_2 = 1 - b_2 \epsilon$, where $\epsilon > 0$ is a small quantity and $b_1, b_2 > 0$.
Suppose, without loss of generality, that $\tau_1 = \tau$ and
$\tau_2 = \tau + \Delta$, where $\Delta \geq 0$.

Define the modified-least-time-to-go (MLG) policy by,

$$f^{MLG}(\mathbf{x}) = \begin{cases} 2 & \text{if } \mathbf{x} = (0, \Delta - 1), \\ \max\left\{\arg\min_{n \in \{1,2\}} (\tau_n - x_n)\right\} & \text{otherwise.} \end{cases}$$

In words, the policy schedules the client with least $\tau_n - x_n$,
i.e., “least time to go”, in most of the states, and breaks
the ties by selecting the client with larger threshold. There
is only one exception: When $\mathbf{x} = (0, \Delta - 1)$, client 2 is
scheduled, although $\tau_2 - x_2 = \tau_1 + 1 > \tau_1 - x_1 = \tau_1$.

We will show in Section 8 that this MLG policy is indeed
optimal when $\epsilon \to 0$. We first explore its properties.

7.2 Regeneration Cycle 

In the following, by the cost incurred in time slots
$t_1, t_1 + 1, \dots, t_2$, we mean the quantity,

$$\prod_{t=t_1}^{t_2} \exp\left(\theta \sum_{n} \mathbb{1}\{Y_n(t) = \tau_n\}\right). \qquad (14)$$

The regeneration point of interest to us is defined as the
time epoch when the system hits the state $(1, 0)$, i.e.,
$\mathbf{Y}(t) = (1, 0)$.

A regeneration cycle is the time interval between two
successive regeneration points. For any stationary policy $f$,
let $v_{cycle}$ be the cost incurred in a regeneration cycle (recall
(14)), and let $l_{cycle}$ be the length of the regeneration cycle.
Since $f$ is a stationary policy, $v_{cycle}$ and $l_{cycle}$ are random
variables which are i.i.d. in different regeneration cycles.
Thus, as in renewal theory, we have,

$$\begin{aligned} J(f, \mathbf{x}) &= \lim_{T \to \infty} \frac{1}{\theta T} \ln \tilde{V}_T^{f}((1, 0)) \\ &= \lim_{T \to \infty} \frac{1}{\theta T} \ln E\left[\prod_{j=1}^{M_T^{(cycle)}} v_{cycle}^{(j)}\right] \\ &= \lim_{T \to \infty} \frac{1}{\theta} \frac{M_T^{(cycle)}}{T} \ln E[v_{cycle}] \\ &= \frac{1}{\theta} \frac{\ln E[v_{cycle}]}{E[l_{cycle}]}, \quad \forall \mathbf{x} \in \mathbb{Y}, \end{aligned} \qquad (15)$$

where the first equality follows from (11); since $v_{cycle}^{(j)}$
denotes the cost incurred during the $j$-th regeneration cycle,
the second equality follows from the definition of $\tilde{V}_T^{f}(\mathbf{x})$ in
(8) with $M_T^{(cycle)}$ denoting the total number of regeneration
cycles during $T$ time slots, and the third equality holds since
$v_{cycle}^{(j)}, \forall j$ are i.i.d.

Result (15) reduces the analysis of the long-term average
cost to the analysis of the expected cost and expected length
of a regeneration cycle. Thus, it facilitates the following
discussions.

[Figure 2 legend: time slot allocated to client 1; time slot allocated to client 2; S: the packet transmission succeeds; F: the packet transmission fails]

Figure 2: The SS-points and SS-periods are
illustrated in a two-client scenario. (We arbitrarily
allocate the time slots here just to give an example.)


7.3 SS-Point and SS-Period 

Define the SS-points as the time slots such that the
packet-transmissions in the two successive time slots preceding this
time-slot are both successful. See the examples in Fig. 2.
More formally,

$$\tau_j^{SS} := \begin{cases} \min\{t : t > 0 \text{ and slots } t-1, t-2 \text{ have successful transmissions}\} & \text{for } j = 1, \\ \min\{t : t > \tau_{j-1}^{SS} \text{ and slots } t-1, t-2 \text{ have successful transmissions}\} & \text{for } j = 2, 3, \dots \end{cases} \qquad (16)$$

Thus, $\tau_j^{SS}$ is the $j$-th SS-point.

Define the time interval between two successive SS-points
as an SS-period, and let $v_{ss}(\mathbf{x})$ be the cost (14) incurred
during an SS-period when the system state at the beginning
of the SS-period is $\mathbf{x}$. Under the application of a stationary
policy, the random variable $v_{ss}(\mathbf{x})$ is i.i.d. across different
SS-periods for each fixed $\mathbf{x}$.
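
Definition (16) can be applied mechanically to a recorded sequence of transmission outcomes. The sketch below lists the SS-points of such a sequence and the lengths of the SS-periods between them; the function names and the representation of outcomes as a list of booleans indexed by slot (starting at slot 0) are assumptions for illustration.

```python
def ss_points(success):
    """Return all slots t such that slots t-1 and t-2 were both successful.

    success[t] is True if the transmission in slot t succeeded.
    """
    return [
        t for t in range(2, len(success) + 1)
        if success[t - 1] and success[t - 2]
    ]


def ss_period_lengths(points):
    """Lengths of the SS-periods, i.e., gaps between successive SS-points."""
    return [later - earlier for earlier, later in zip(points, points[1:])]
```

For the outcome sequence S, S, S, F, S, S the SS-points are the slots 2, 3 and 6, so the two SS-periods have lengths 1 and 3. An SS-period has length 1 exactly when no failure occurs in it, which is the observation behind the O(ε) estimates that follow.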

It directly follows that, under an arbitrary NE policy
(recall Appendix A),

$$P(v_{ss}(\mathbf{x}) > 1) = O(\epsilon), \qquad (17)$$

which is obtained by noting that:

i) $v_{ss}(\mathbf{x}) > 1$ only if for some client $n$, we have $Y_n(t) = \tau_n$
for some time-slot $t$ in this SS-period.

ii) However, $Y_n(t) = \tau_n$ for some time-slot $t$ in this
SS-period is possible only if the length of this SS-period is
$> 1$ (which happens with probability $O(\epsilon)$).

The statement i) follows from the definition of $v_{ss}(\mathbf{x})$, while
the statement ii) holds because NE policies do not serve client
2 when the system state is $(\tau_1, 0)$, and do not serve client
1 when the state is $(0, \tau_2)$, and thus any possible starting
state $\mathbf{x}$ of an SS-period satisfies $x_n < \tau_n$, $n = 1, 2$.

Further, if $P_{\mathbf{x}}^{(SS)}$ is the probability that in a regeneration
cycle (i.e., time between two successive hits of the state
$(1, 0)$, recall Section 7.2), there is at least one SS-period
in which the starting state is $\mathbf{x}$, then,

$$E[v_{cycle}] = 1 + \sum_{\mathbf{x}} P_{\mathbf{x}}^{(SS)} \left(E[v_{ss}(\mathbf{x})] - 1\right) + o(\epsilon^k), \qquad (18)$$

where $k \geq 1$ is an integer such that $d_1 \epsilon^k \leq E[v_{cycle}] - 1 \leq d_2 \epsilon^k$
for some $d_1, d_2 > 0$ when $\epsilon$ is sufficiently small. This is
obtained by noting that,

1) A regeneration cycle consists only of SS-periods.

2) For any regeneration cycle, $v_{cycle} > 1$ only if at least
one of the SS-periods included in this cycle has $v_{ss}(\cdot) > 1$.

3) The probability that two or more SS-periods incur a
cost $v_{ss}(\cdot) > 1$ is much less than the probability that
only one of these SS-periods incurs a cost $v_{ss}(\cdot) > 1$,
when $\epsilon$ is small enough. The probability of this event
being small follows from (17).

In the above, the proof of statements 1) and 2) is direct and
omitted. It should be noted that the technique of ignoring
events having relatively small probabilities, as employed in
the proofs of statement 3) and (18), is frequently used in
the remainder of this paper. These results facilitate our
following analyses.

Now, we consider a regeneration cycle consisting of only
successful transmissions. Denote $\mathcal{X}_{ss}$ as the set of all the
system states hit during such a regeneration cycle.

Lemma 3. The following results hold:

1) For an arbitrary NE policy,

$$E[v_{cycle}] \geq 1 + \sum_{\mathbf{x} \in \mathcal{X}_{ss}} \left(E[v_{ss}(\mathbf{x})] - 1\right) + o(\epsilon^k), \qquad (19)$$

where $k$ is an integer such that $d_1 \epsilon^k \leq E[v_{cycle}] - 1 \leq d_2 \epsilon^k$
for some $d_1, d_2 > 0$ when $\epsilon$ is sufficiently small.

2) When the MLG policy is applied,

$$E[v_{cycle}] = 1 + \sum_{\mathbf{x} \in \mathcal{X}_{ss}} \left(E[v_{ss}(\mathbf{x})] - 1\right) + o(\epsilon^k), \qquad (20)$$

where $k$ is similarly defined as for (19).

3) Further, when the MLG policy is applied, and if $\Delta \geq 2$,

$$\mathcal{X}_{ss} = \{(1, 0), (0, x_2), \forall x_2 = 0, \dots, \Delta - 1\}, \qquad (21)$$

$$E[l_{cycle}] = \Delta + O(\epsilon); \qquad (22)$$

$$E[v_{ss}(1, 0)] = 1 + b_1^{\tau-1} \epsilon^{\tau-1} (\exp(\theta) - 1) + O(\epsilon^{\tau}), \qquad (23)$$

$$E[v_{ss}(0, x_2)] = 1 + O(\epsilon^{\tau}), \quad \forall x_2 = 0, \dots, \Delta - 1. \qquad (24)$$

Similar results can be obtained for $\Delta = 0, 1$.

Proof. These results follow from (17), (18), and the
definition of $\mathcal{X}_{ss}$. The proof is straightforward, and the details are
omitted. □

8. ASYMPTOTICALLY OPTIMAL POLICIES 
8.1 The Two-Client Scenario 

Consider the case where there are two clients sharing an
AP, with channel reliabilities $p_1 = 1 - b_1 \epsilon$, $p_2 = 1 - b_2 \epsilon$, and
thresholds $\tau_1 = \tau$, $\tau_2 = \tau + \Delta$. Without loss of generality
assume $\Delta \geq 0$. Recall the definition of the MLG policy
in Section 7.1 and note that $b_1, b_2, \epsilon > 0$. The following
theorem establishes the optimality of the MLG policy.

Theorem 3. The following results hold: 

1) The risk-sensitive cost under MLG policy is, Vx, 

f A 0 e T_1 + 0(e T ) if A = 0 

J(/ MLG , x) = i {t^i+0(e r ) if A = 1 


e -1 lt —1 r —1 


-bl 


+ 0(0 


if A > 2 


where 


A 0 — 


e H - 1 


0 


E b i & r w + 


3=1 




2) The optimal cost J*(x) := min fJ(f, x) has a lower bound, 
J*(x) > 


A 0 e r-1 + o(e T_1 ) 

if A = 0 

^A 1 +- 1 + 0 ( 6 - 1 ) 

if A = 1 

T^e-' + r 1 ) 

if A > 2 


where 

Aq is as in the statement above 
brain := min{fol, 62 } 

Ai ■■= 6I" 1 + (r - 1) blfA 


A 2 := min 


for 1 + (n - l)fo^ 


A ’ A + 1 

ftp 1 + (n -1)60+EJ= 1 1 6 J 3 ftr 1 ~ J 

A + 2 

3) Thus, it follows from 1) and 2) above that the MLG 
policy is optimal in the high reliability asymptotic regime 
(i.e., small e) if any of the following conditions is satisfied: 


(i) A = 0; 


(ii) A = 1, and foi < & 2 ( 


(iii) A > 2 , and b\ 1 < A(r — 1 )foj 1 . 

Proof. We will only consider the case when $\Delta \geq 2$, since
the analyses for the cases when $\Delta = 1$ or $0$ follow similar
arguments.

By Lemma 3, under the application of the MLG policy,
we have,

$$E[v_{cycle}] = 1 + b_1^{\tau-1} \epsilon^{\tau-1} (\exp(\theta) - 1) + O(\epsilon^{\tau}). \qquad (25)$$

Thus, statement 1) follows by combining (15), (22) and (25).
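
Substituting (22) and (25) into (15) and expanding the logarithm gives, for Δ ≥ 2, the leading-order cost (exp(θ) − 1) b_1^{τ−1} ε^{τ−1} / (θΔ). The following sketch checks this numerically for a small instance by iterating the finite-horizon recursion of MDP-2 under the MLG policy and reading off the growth rate in definition (11). All parameter values are assumptions chosen for illustration, client indices are 0-based in the code, and the value vector is renormalized at the regeneration state (1, 0) in every step purely to avoid overflow.

```python
import math
from itertools import product


def mlg_index(x, tau, delta):
    """MLG policy for two clients; returns the 0-based index of the served client."""
    if x == (0, delta - 1):
        return 1
    slack = (tau - x[0], tau + delta - x[1])
    return 1 if slack[1] <= slack[0] else 0


def mlg_cost_estimate(theta, tau, delta, b1, b2, eps, horizon=5000):
    """Estimate J(f_MLG, (1, 0)) as ln(V_T((1, 0))) / (theta * T) for T = horizon."""
    taus = (tau, tau + delta)
    p = (1.0 - b1 * eps, 1.0 - b2 * eps)
    states = list(product(range(taus[0] + 1), range(taus[1] + 1)))
    value = {x: 1.0 for x in states}   # zero-horizon cost
    log_scale = 0.0                    # accumulated log of the normalizations
    for _ in range(horizon):
        new_value = {}
        for x in states:
            u = mlg_index(x, tau, delta)
            # Next state after a success / a failure, truncated at the thresholds
            ok = tuple(0 if m == u else min(x[m] + 1, taus[m]) for m in range(2))
            bad = tuple(min(x[m] + 1, taus[m]) for m in range(2))
            stage = math.exp(theta * sum(x[m] == taus[m] for m in range(2)))
            new_value[x] = stage * (p[u] * value[ok] + (1.0 - p[u]) * value[bad])
        scale = new_value[(1, 0)]
        log_scale += math.log(scale)
        value = {x: v / scale for x, v in new_value.items()}
    return log_scale / (theta * horizon)


def mlg_cost_leading_order(theta, tau, delta, b1, eps):
    """Leading-order term obtained from (15), (22) and (25), valid for delta >= 2."""
    return (math.exp(theta) - 1.0) * (b1 * eps) ** (tau - 1) / (theta * delta)
```

For θ = 1, τ = 2, Δ = 2, b_1 = b_2 = 1 and ε = 0.002, the two quantities are expected to agree up to a relative error of order ε, which is what the asymptotic statement predicts.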

To prove statement 2), we begin by deriving the lower 
bound of E[v aa (x)] for any system state x of the form (•, 0) or 
(0, •). Note that, any possible starting state of an SS-period 












is of this form. In the following, we focus on the analysis of 
E[ii ss (l, 0)], since the analysis of the cost for the SS-period 
starting with any other state follows similar arguments. 

Consider the evolution of the system over an SS-period starting
with state (1, 0) under the application of an arbitrary
stationary policy. Then we have the following two possibilities:

(a) The policy serves client 2 before the earlier of these two
events: i) a successful packet delivery for client 1, ii) the
system hits the value $(\tau, \tau - 1)$. Under such a policy, it
can be shown that a cost $v_{ss}(1, 0) > 1$ is incurred with
a probability $\ge d\epsilon^{\tau-2}$ for some $d > 0$.

(b) The policy does not serve client 2 before the earlier of the
following two events: i) a successful packet delivery for
client 1, ii) the system hits the value $(\tau, \tau - 1)$. Then,
if failures occur in all of the first $\tau - 1$ time slots of
the SS-period, the state $(\tau, \tau - 1)$ will be hit. Thus,
a cost $v_{ss}(1, 0) \ge \exp(\theta)$ is incurred with a probability
$\ge b_1^{\tau-1}\epsilon^{\tau-1}$.

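Consequently, it follows from (a) and (b) above that

$$E[v_{ss}(1, 0)] \ge 1 + b_1^{\tau-1}\epsilon^{\tau-1}(\exp(\theta) - 1) + o(\epsilon^{\tau-1}),$$

since case (a) contributes an excess cost of the larger order $\epsilon^{\tau-2}$, while in case (b) a cost of at least $\exp(\theta)$, rather than 1, is incurred when the first $\tau - 1$ transmissions all fail.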

Similar arguments lead us to conclude the following lower
bounds on $E[v_{ss}(x)]$ under the application of an arbitrary
stationary policy (recall that x should be of the form (0, ·)
or (·, 0) since it is a possible starting state of an SS-period):

i. $\forall x \in \{(0, x_2) \mid x_2 \le \Delta - 1\}$, $E[v_{ss}(x)] \ge 1$.

ii. $\forall x \in \{(0, x_2) \mid x_2 \ge \Delta + 2\} \cup \{(x_1, 0) \mid x_1 \ge 2\}$, $E[v_{ss}(x)] \ge 1 + d\epsilon^{\tau-2} + o(\epsilon^{\tau-2})$, with some $d > 0$.

iii. $E[v_{ss}(1, 0)] \ge 1 + b_1^{\tau-1}\epsilon^{\tau-1}(\exp(\theta) - 1) + o(\epsilon^{\tau-1})$.

iv. E[i! as (0, A + 1)] > 1 + ^ 2^1 1 ’ ,£ ' r 1 + o(e T 1 ). 


v. EK( 0 , A)] > 1 + (r - + o(e T ~ 1 ).

By combining these results with the inequality (19) and using
equation (15), we obtain the second statement.

The third statement is a simple consequence of the first
two statements. □

In Theorem 3, the first statement characterizes the risk-sensitive
cost of the MLG policy, while the second provides
a lower bound on the cost for any stationary policy. The
third statement provides three sufficient conditions under
which the MLG policy is asymptotically optimal. These
conditions are related to the difference between the inter-delivery
thresholds for different clients, and the difference in
their relative failure probabilities.

8.2 The General Case: N Clients in the High-Reliability Regime

Now, we consider the general case where there are N
clients sharing an AP, with the channel reliability of the
n-th client being $p_n = 1 - b_n\epsilon$, where $\epsilon > 0$ is a small quantity
and $b_n > 0$. The inter-delivery threshold of client n is
$\tau_n$. It is assumed that $N \le \tau_1 \le \tau_2 \le \cdots \le \tau_N$, without
loss of generality.

Since Theorem 2 implies that there exists a stationary
policy that is optimal for the MDP-2, we focus exclusively
on stationary policies. Now, we obtain a characterization of
the optimal policy.

We define a regeneration point as the time epoch when
the system hits the state $(0, 1, \ldots, N - 1)$, i.e., time t is a
regeneration point iff $Y(t) = (0, 1, \ldots, N - 1)$. (Recall that
for the case of 2 clients as discussed in Section 7.2, the
regeneration point is the epoch when the state (1, 0) is hit.)
The regeneration cycle is the time interval between two
successive regeneration points. Now, consider a regeneration
cycle consisting of only successful transmissions, and denote
by $X_{SN}$ the sequence of system states hit during such a
regeneration cycle. Note that $X_{SN}$ is a deterministic sequence
for a given stationary policy. Let $|X_{SN}|$ be the length of
this sequence, and $X_{SN}(j)$ be the j-th state in this sequence.

We also define an SN-point as the time slot when the
packet transmissions in the N successive time slots preceding
this time slot are all successful. (This is similar to the
definition of SS-point in (16).) The SN-period is the time
interval between two successive SN-points, and we let $u_{SN}(x)$
denote the cost (14) incurred during an SN-period when the
system state at the beginning of the SN-period is x, similar
to the two-client case.

We further consider a time interval comprising no fewer
than N time slots, which starts when the system state
assumes the value x and ends when the nearest SN-point (such
that the length of the period is $\ge N$) is hit. Denote by $v_{SN}(x)$
the cost (14) incurred during such a period. (Note that this
is different from $u_{SN}(x)$ because $u_{SN}(x)$ is the cost incurred
during an SN-period, and an SN-period may have a
length strictly less than N. An example of an SN-period with
length 1 is shown in Fig. 2 for the two-client scenario.)

Lemma 4. The optimal policy is a member of the set,
$\{f : J(f, x) = O(\epsilon), \forall x \in \mathbb{Y}\}$.

Further, for any policy f in this set, the following results
hold:

1) The risk-sensitive cost is,

$$J(f, x) = \frac{1}{|X_{SN}|} \sum_{j=1}^{|X_{SN}|} \left(E[u_{SN}(X_{SN}(j))] - 1\right) + o(\epsilon^k), \tag{26}$$

for any system state x, where $k \ge 1$ is an integer such
that $d_1\epsilon^k < J(f, x) < d_2\epsilon^k$ for some $d_1, d_2 > 0$ when $\epsilon$ is
sufficiently small.


2) For any possible starting state x of an SN-period, we
have,

$$E[v_{SN}(x)] = 1 + \sum_{j=0}^{N-1} \left(E[u_{SN}(S^j(x))] - 1\right) + o(\epsilon^k), \tag{27}$$

where $S^1(x)$ is the state that succeeds state x in the event
of a successful transmission when policy f is applied, i.e.,

$$S^1(x) := (x_1 + 1, \ldots, x_{f(x)-1} + 1, 0, x_{f(x)+1} + 1, \ldots, x_N + 1) \wedge \tau;$$

also $S^{j+1}(x) := S^1(S^j(x))$, $j = 1, 2, \ldots$; $S^0(x) := x$,

and k is an integer such that $d_1\epsilon^k < E[v_{SN}(x)] - 1 < d_2\epsilon^k$
for some $d_1, d_2 > 0$ when $\epsilon$ is sufficiently small.





Proof. The results are obtained using arguments similar to
the case of two clients in the high reliability regime (see
Sections 7.2 and 7.3, and equations (15), (17), (18), (19)). □

One may note that the r.h.s. of (27) is closely related to
the r.h.s. of (26), by noting that $S^j(X_{SN}(1)) = X_{SN}(j + 1)$,
$\forall j = 1, \ldots, |X_{SN}| - 1$, and that $|X_{SN}| \ge N$ holds whatever
policy is applied. Thus, the following assumption is not
restrictive.

Assumption 1. A stationary policy that minimizes $E[u_{SN}(x)]$
for each system state $x \in \mathbb{Y}$ also minimizes $J(f, x)$.
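To make the objects $X_{SN}$ and $S^j$ concrete, the following Python sketch enumerates the states visited during a regeneration cycle in which every transmission succeeds, for a given stationary policy. The helper names (`successor`, `success_cycle`, `least_time_to_go`) are hypothetical, clients are indexed from 0, and the example policy is a plain least-time-to-go rule chosen only for illustration; it is not claimed to be the MLG or SN policy. The convention that the sequence starts at the regeneration state and stops just before returning to it is likewise an assumption.

```python
from typing import Callable, List, Tuple

State = Tuple[int, ...]
Policy = Callable[[State], int]


def successor(x: State, n: int, tau: State) -> State:
    """State after a successful delivery to client n (0-based).

    Component n resets to 0; every other component grows by one and is
    capped at its threshold, i.e. (x_1+1, ..., 0, ..., x_N+1) ^ tau.
    """
    return tuple(0 if i == n else min(xi + 1, tau[i]) for i, xi in enumerate(x))


def success_cycle(policy: Policy, tau: State) -> List[State]:
    """States hit from the regeneration state (0, 1, ..., N-1) until it
    is hit again, assuming all transmissions succeed."""
    start = tuple(range(len(tau)))
    cycle, seen = [start], {start}
    x = successor(start, policy(start), tau)
    while x != start:
        if x in seen:  # the policy loops without regenerating
            raise ValueError("policy never returns to the regeneration state")
        seen.add(x)
        cycle.append(x)
        x = successor(x, policy(x), tau)
    return cycle


def least_time_to_go(tau: State) -> Policy:
    """Illustrative rule: serve the client closest to its threshold
    (ties broken in favour of the lowest index)."""
    return lambda x: min(range(len(tau)), key=lambda n: tau[n] - x[n])


cycle = success_cycle(least_time_to_go((3, 5)), (3, 5))
print(cycle)  # [(0, 1), (0, 2), (0, 3), (1, 0)]
```

In this toy instance with thresholds (3, 5), the cycle has length 4, which is at least N = 2, and each state is the success-successor of the previous one, mirroring the relation $S^j(X_{SN}(1)) = X_{SN}(j + 1)$.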


Algorithm 1: SN Policy Algorithm 
input : A, 0, n, • • • ,t n , bi, ■ ■ ■ , b N . 
output: Policy g(x),Vx £ ¥. 

1 Y 0 = {x : 3A > 0, min,, (E[D s n(x)]) = 1 + A + o(l)}; 

2 for each x £ Yo do 

3 A(x) is as in Step |TJ 

4 L s( x ) ^ argmin^Li A(S n (x)) ; 

5 Z <— 0; Y r emain t— 0 

6 foreach ft = 1 to (minOLiTn) do 

7 ZgZU Yfc_i; 

8 Yfc {x : x + 1 £ Yfc_i and x ^ Z}; 

9 repeat Yfc <— Yfc U {x : 5„(x) € 

ZuY fe ,Vn and x ZU Yfc} until Yk not extend; 
10 foreach ft = 1 to (min„ =1 r n ) do 
% 

repeat 

Y" Y' k - Yj(. <- 0; 

foreach x € Y" do 

m ■£- max{j : 3n, 5„(x) € Yj}; 

U se t -f- {n : <S n (x) £ Y m }; 
if m>ft then 

L [4 x ).fl( x )]^- min «et/ S etfcnA((x+l)Ar); 

else if 3n£U S et, A(S n (X-)) not yet then 

L n^Y(Ux; 

else 

[A(x),p(x)] 4 - mm neUset A(Sn(x)) + 

|_ 6nA((x + 1) A r)l{(x+ l)ArG Yfc-i}; 

until Y' fc = 0 or Y' fc = Y'*'; 

if Y^. 7 ^ 0 then 

Yremain 4 — Y re main U Yfcj 4 — 0, Vx G Yfc; 

foreach n— 1 to N — 1 do 
foreach x G Yfc do 

ULt <- p(x) or C/set; 

^(x) G- min neC7 / et 5(5n(x)) + 

_ b n A((x + 1)Ar)l{(x + 1)At G Yfc_i} 

foreach x G Yfc do 

[A(x),#(x)] 4— min n€C/ / et B(5 n (x)) + 
b n A((x. + 1) A r)l{(x + 1 )AtG Yfc_i} 
Consequently, we design Algorithm 1 above to obtain a
stationary policy, denoted the SN policy, which tends to
minimize $E[u_{SN}(x)]$ for any system state $x \in \mathbb{Y}$. Here,

$$S_n(x) := (x_1 + 1, \ldots, x_{n-1} + 1, 0, x_{n+1} + 1, \ldots, x_N + 1) \wedge \tau,$$

i.e., the state that succeeds the state x in the event of a
successful transmission for client n in the MDP-2. This
algorithm divides the state space into sets
$Y_0, Y_1, \ldots, Y_{\min_{n=1}^{N}\{\tau_n\}}$, with,

$$Y_k := \{x : \exists A > 0, \min(E[u_{SN}(x)]) = 1 + A\epsilon^k + o(\epsilon^k)\}, \tag{28}$$

for $k = 0, 1, \ldots, \min_{n=1}^{N}\{\tau_n\}$. Then it obtains or approximates
the A in (28) for each system state x (denoted A(x)),
and decides the optimal control based on A(x).

Theorem 4. When Assumption 1 holds, and $Y_{\text{remain}}$ in
Algorithm 1 is empty, then the SN policy is optimal in the
high reliability asymptotic regime.

Theorem 4 directly follows from Assumption 1 and the
design of Algorithm 1.¹ The example in Fig. 5 illustrates the
simulation for a multi-client system when these asymptotic
conditions are satisfied.


9. SIMULATIONS 

We now present the results of a simulation study comparing
several wireless scheduling policies with respect to
their risk-sensitive average costs. We present the results for
scenarios with clients requiring different inter-delivery
thresholds and under heterogeneous channel reliabilities.

The wireless scheduling policies implemented include the
optimal policy (OP) obtained from Theorem 2, the modified-least-time-to-go
(MLG) policy (for the two-client scenario)
proposed in Section 8.1, and the SN policy proposed in
Section 8.2. Also, two other heuristic policies are compared:
the packet-level round-robin policy (PRR), and the
largest-weighted-delivery-debt (WDD) policy, which serves the client
with the largest weighted delivery debt, where:

$$\text{Delivery Debt}_n = \frac{t}{p_n\tau_n} - \frac{M_t^{(n)}}{p_n}.$$

(Recall that $M_t^{(n)}$ is the number of packets delivered for the n-th
client by time t, as in 0.) The WDD policy has been known
to be “timely-throughput” optimal (see 10 for discussion).
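A minimal Python sketch of the WDD selection rule follows, assuming the delivery debt has the form $t/(p_n\tau_n) - M_t^{(n)}/p_n$ as read from the formula above. The function names, the 0-based client indexing, and the tie-breaking rule (lowest index wins) are assumptions introduced for illustration, not details stated in the paper.

```python
from typing import List, Sequence


def weighted_delivery_debts(t: int, delivered: Sequence[int],
                            p: Sequence[float], tau: Sequence[int]) -> List[float]:
    """Weighted delivery debt t/(p_n * tau_n) - M_n/p_n for every client n.

    delivered[n] plays the role of M_t^(n), the number of packets
    delivered to client n by time t.
    """
    return [t / (p_n * tau_n) - m_n / p_n
            for m_n, p_n, tau_n in zip(delivered, p, tau)]


def wdd_choose_client(t: int, delivered: Sequence[int],
                      p: Sequence[float], tau: Sequence[int]) -> int:
    """Index of the client with the largest weighted delivery debt
    (ties go to the lowest index; this tie-break is an assumption)."""
    debts = weighted_delivery_debts(t, delivered, p, tau)
    return max(range(len(debts)), key=debts.__getitem__)


# Hypothetical snapshot at t = 10 with two clients.
debts = weighted_delivery_debts(10, [2, 1], [0.5, 0.8], [2, 5])
chosen = wdd_choose_client(10, [2, 1], [0.5, 0.8], [2, 5])
print(debts, chosen)
```

In this snapshot the first client has the tighter threshold and the less reliable channel, so it accumulates the larger debt and is the one served.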

Fig. 3 shows the costs incurred by these four wireless
scheduling policies for different risk-sensitive parameters. It
can be seen that the optimal policy always outperforms all
the other policies.

Fig. 4 compares the scheduling policies under different
channel reliabilities in the two-client scenario. It can be
seen that even when the channel reliability probabilities are
only moderate, e.g., $p_1 = 0.6$ and $p_2 = 0.8$, the MLG policy
still achieves almost the optimal cost, and outperforms all
other greedy policies.

Fig. 5 compares the scheduling policies in a multi-client
scenario. It can be seen that even when the channel reliability
probabilities are only moderate, e.g., 0.8, the SN policy
still approximates the optimal cost, and outperforms all
other greedy policies. Here, we also employ the periodic
scheduling (PS) policy [5], which is optimal when the failure
probabilities are exactly zero. It can be seen that the
PS policy performs extremely poorly even when the failure
probability is very small, e.g., 0.01, since it gives rise to
open-loop policies. In contrast, the high-reliability asymptotic
approach proposed for the scenario with sufficiently
small failure probability provides a well-performing closed-loop
scheduling policy. This confirms the value of the
high-reliability asymptotic approach.

¹ Note that Algorithm 1 can be further improved by using
S(S(x)) in Steps 19-22.

Figure 3: The risk-sensitive average cost vs. the
risk-sensitive parameter $\theta$ for different wireless
scheduling policies is shown. (The parameters are
$N = 2$, $p_1 = 0.4$, $p_2 = 0.1$, $\tau_1 = 20$, $\tau_2 = 40$.)

Figure 4: In the two-client scenario, the normalized
risk-sensitive average cost (normalized by the cost of the
optimal policy) vs. the failure transmission parameter
$\epsilon$. ($p_1 = 1 - 2\epsilon$, $p_2 = 1 - \epsilon$, $\tau_1 = 3$, $\tau_2 = 5$, $\theta = 0.01$.)

10. CONCLUSIONS 

In this paper, we have addressed the issue of designing
scheduling policies in order to support inter-delivery
requirements of wireless clients in cyber-physical systems. A novel
risk-sensitive approach has been employed to penalize the
“exceedance” over the allowable thresholds of inter-delivery
times.

The resulting MDP that involves infinitely many states
can be reduced to an equivalent MDP which involves only
a finite number of states, thus showing that a stationary
optimal policy exists when the risk-sensitive parameter $\theta$ is
sufficiently small. Based on this, we have designed a finite-time
algorithm to obtain the optimal policy.

To address the curse of dimensionality of the MDP approach,
we proposed a high-reliability asymptotic approach,
and derived the optimal policy for the two-client scenario in the
high-reliability regime. Further, we have designed an SN
policy for a general number of clients based on our analysis
result. The simulation results show that the proposed
policies provide near-optimal performance even for moderately
large values of the failure probabilities, justifying the
approach.

(a) Performance of the PRR, WDD, and SN policies

(b) Performance of Periodic Scheduling

Figure 5: In a multi-client scenario, the normalized
risk-sensitive average cost (normalized by the cost
of the optimal policy) vs. the failure transmission
parameter $\epsilon$ is shown. (The parameters are $N = 3$,
$p_1 = p_2 = p_3 = 1 - \epsilon$, $\tau_1 = 4$, $\tau_2 = 6$, $\tau_3 = 8$, $\theta = 0.05$.)
APPENDIX

A. NON-EXCLUSIONARY POLICIES ARE 
NOT RESTRICTIVE 

We recall that Non-Exclusionary (NE) policies are those
stationary policies which do not serve a client n when the
system state x is $(\tau_1, \ldots, \tau_{n-1}, 0, \tau_{n+1}, \ldots, \tau_N)$. We next
show that for a non-NE policy, either there is an NE policy
outperforming it, or it is trivial to derive its cost function.

For a client n, denote by $x^{(n,a)}$ the system state
$(\tau_1, \ldots, \tau_{n-1}, a, \tau_{n+1}, \ldots, \tau_N)$ for integer $a \in [0, \tau_n]$.
Similarly, for $n \ne l$, denote by $x^{(n,a_1;l,a_2)}$ the state x such that
$x_n = a_1$, $x_l = a_2$, and $x_j = \tau_j$, $\forall j \ne n, l$. In a T-horizon
MDP-2 problem, consider the policy which transmits
client n in the first time slot and then follows the optimal
policy, and similarly the policy which transmits client l
in the first time slot and then follows the optimal policy.
Then $V_T^n(x)$, $V_T^l(x)$ are the costs associated with these two
policies with any initial state x, respectively.

Lemma A.1. If there exists a client l such that $p_l > p_n$,
then for each time slot T, we have $V_T^l(x^{(n,0)}) \le V_T^n(x^{(n,0)})$.
That is, if serving client n in state $x^{(n,0)}$ is optimal, serving
client l is also optimal.


Lemma A.2. When $p_n = \min_l p_l$, if the optimal action
in state $x^{(n,0)}$ with T time slots to go is to serve client n,
then the optimal action in state $x^{(n,a)}$, $\forall a \in \{1, \ldots, \tau_n\}$, with
T time slots to go is also to serve client n.


The proofs for Lemma A.1 and Lemma A.2 are omitted
due to space constraints.

Combining Lemma A.1 and Lemma A.2, for a non-NE
stationary policy f which serves client n in state $x^{(n,0)}$,
either there exists an NE policy which outperforms f (as in
Lemma A.1), or f keeps serving client n (after hitting the
state $\tau$) following Lemma A.2, and therefore has a trivially
computable cost.



